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Abstract 

Three classes of stochastic networks and their performance mea- 
sures are considered. These performance measures are defined as the 
expected value of some random variables and cannot normally be ob- 
tained analytically as functions of network parameters in a closed form. 
We give similar representations for the random variables to provide a 
useful way of analytical study of these functions and their gradients. 
The representations are used to obtain sufficient conditions for the 
gradient estimates to be unbiased. The conditions are rather general 
and usually met in simulation study of the stochastic networks. Ap- 
plications of the results are discussed and some practical algorithms of 
calculating unbiased estimates of the gradients are also presented. 

Key- Words: stochastic network, stochastic optimization, gradient 
estimation, perturbation analysis. 

1 Introduction 

Stochastic network models are widely used in modern engineering, manage- 
ment, biology, etc., to investigate real systems. These models are often so 
complicated that they can hardly be studied with the help of the analytical 
methods only. A more fruitful way is to use computer simulation to analyze 
the networks [HEIE]. By performing simulation experiments, one may get 
a large amount of information about the network behaviour. 

Usually, the main aim of the analysis is to improve a network perfor- 
mance. In order to optimize a performance criterion with respect to net- 
work parameters, one needs to evaluate it. Simulation provides estimating 
the criterion as well as its sensitivity (or its gradient, when the parameters 
are continuous) in a rather simple way. It is generally not difficult to obtain 
the estimates provided that there exists a simulation model, although each 
simulation experiment may be very time consuming. 
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There are many stochastic optimization procedures which use the data 
obtained by simulation (see [1] and also a short survey in [3|). In many 
cases, the procedures that exploit gradient estimates are preferred to those 
using estimates of the objective function only. Specifically, the stochastic 
algorithms which apply unbiased estimates of gradient are often highly effi- 
cient. As an example, one can compare the Robbins-Monro procedure with 
the Kiefer-Wolfowitz one. It is well known [4J that the first procedure based 
on the unbiased estimates of gradient, converges to the solution faster than 
the second one which approximates the gradient by the finite differences. 

In this paper, we analyze the problem of an unbiased estimation of the 
gradient of stochastic network performance measures. The paper is based 
on the results obtained in [5l|6]. In Section O we describe three classes of 
stochastic networks and give some examples of the networks and their related 
optimization problems. We show that the sample performance functions of 
the networks of all three classes may be represented in a similar way. In fact, 
these functions are expressed through those given by using the operations 
of maximum, minimum and addition. 

Section [3] includes technical results which provide a general representa- 
tion for the sample performance functions of the networks. 

In Section H] we briefly discuss three methods of estimating gradients, 
based on simulation data. 

The main results are presented in Section O First, we introduce a set of 
functions for which one may obtain unbiased estimates of their gradients. We 
prove some technical lemmae to state the properties of the set. In conclusion, 
we give the conditions that prove the gradient estimates to be unbiased. 
These conditions are rather general and usually fulfilled in simulation studies 
of the stochastic networks. 

In Section [6] some algorithms of calculating the gradient estimates are 
described. 

2 Stochastic Networks and Related Optimization 
Problems 

In this section we present three classes of stochastic networks and discuss 
their related optimization problems. A performance criterion of the network 
is normally defined as the expected value of a random variable, f{6, to) , i.e., 

Fie) = EM{e,u)]=E[fie,u;)], 

where E B C i?" is a set of decision parameters and u is a random vector 
representing the randomness of network behaviour. As a function of the 
parameters, f{9,u}) is usually called a sample performance function. 

The problem is to optimize the performance measure F{9) with respect 
to € 0. In practical problems, it is often very hard to evaluate the 



2 



expectation analytically in closed form, even if there is an analytical formula 
available for f{9, uj) . However, it is normally not difficult to obtain the value 
of f{6,Lo) for any fixed 9 G Q and any realization of lo by using simulation. 
In that case, one can use the Monte Carlo approach to estimate the objective 
function F{6) or its gradient. 

The main purpose of this section is to show that for many optimization 
problems, the sample performance function f{9,uj) may be represented in 
similar algebraic forms. In other words, f{9,uj) is expressed in terms of 
some given random variables by means of the operations max , min , and + . 
This representation offers the potential for analytical study of the estimates 
of performance measure gradients. It also provides a theoretical background 
for efficient algorithms of calculating the estimates. 

2.1 Activity Network 

We begin with stochastic activity network models widely used in corporate 

management in the scheduling of large projects. Consider a project consist- 
ing of some activities (or jobs) which must be done to complete it. Each 
activity is presumed to require a random amount of time for performing it. 
It is not permitted to begin each activity until some preliminary activities 
have been performed. One is normally interested in reducing the expected 
completion time of the whole project. 

In order to describe the project as a network, we define an oriented 
graph (N, A) , where N is the set of nodes and A is the set of arcs. Each 
node i € N represents the corresponding activity of the project. For some 
i,j € N, the arc belongs to A if and only if the ith activity must 

directly precede the jth one. 

To simplify further formulae we define the set of the father nodes as 
^^(i) = {j G G A}, and the set of the daughter nodes as 'Njj^i) = 

{j G N|(i,j) G A} for every i G N. In addition, we introduce the set 
of starting nodes N5 = {i € N|Ni?(i) = 0} and the set of the end nodes 
NE = {ie N|ND(i) = 0} of the graph. 

Now wc have to define the duration of the activities, so that the network 
would be completely described. Denote the duration of the ith. activity 
by Tj, i G N. We assume to be a positive random variable, such that 
Ti = Ti{9, 00) , where G 9 is a set of decision parameters and a; is a random 
vector which represents the random effects involved in realizing the project. 
The set T = {Ti\i G N} is presumed to be given. 

The sample completion time of the zth activity may be expressed in the 
form 




(1) 



3 



For the sample completion time of the whole project, we have 

t(0,uj) = max ti(9,uj). 

In that case, the expected completion time is 

ne) = E[t{e,u)], 

and the problem is to minimize T{6) with respect to ^ G 0. 

It is easy to see from ([T]) that one can represent t as a function of r € T 
by using the operations max and + . To illustrate this, consider the simple 
network depicted in Figure [TJ 




Figure 1: An activity network. 

For this network, applying ([1]) successively, we may write the sample 
completion time as 

t = Ti+ max{max{r2, Ts} + t^, T3 + t^} + tq. 

We will exploit the possibility of t being expressed in such a form in the 
discussion below. 

We conclude this example with the remark about the main difficulty of 
the activity network optimization problem. It is easy to understand that in 
the case of general random variables r G T , it is usually very difficult or even 
impossible to obtain the expected completion time analytically, even if the 
network is as simple as that in Figure [TJ To apply an optimization procedure 
in this situation, one normally estimates this function or its gradient by using 
the Monte Carlo approach. Notice, however, that the simulation models of 
such networks are generally rather simple. 
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2.2 Reliability Network 



Another class of stochastic network models arises from the reliability investi- 
gation of complex interconnected systems in engineering, military research, 
biology etc. Consider a system of elements having bounded random life- 
times. Each element keeps in order until either this element has failed or 
all those directly supplying it have lost their working conditions. The whole 
system is presumed to be in order if at least one of the main elements that 
are supplied by some others but do not supply any element, keeps work- 
ing. A usual problem in analyzing the system is to maximize its expected 
lifetime. 

Let (N, A) be the directed graph describing the relations between the 
system elements. In the graph, the set of nodes N corresponds to the 
set of system elements. If for some i,j £ N, the ith element directly 
supplies the jth one, then (i, j) G A. For the graph, we retain the notations 
Np{i),ND{i),'Ns , and N^; introduced above. 

For each element z G N, we define the lifetime as the random variable 
Ti{6,uj) which depends on the set of decision parameters 9 £ Q. Assume 
the set T = {tj} to be given. Now, we may represent the time for the ith 
element to be in order as 

^ (Q i ^^^T^{™-^^jeNp(i)tj{0,u}),Ti{9,uj)}, if i N5, 

The sample and expected lifetimes of the whole system may be written 

as 

t{e,uj) = maxti{e,uj) and T{e) = E[t{e,uj)], 
respectively. 

To illustrate this reliability network model, consider that depicted in 
Figure EKP]). 

For the sample lifetime of the system, we have from ([2]) 

t = min{Ti, max{min{r4, re}, min{max{T2, rs}, T5}}, ry}. 

We can see that the sample lifetime of such a network has one important 
property: it may be represented as a function of all r € T by using only 
the operations max and min. Note that the difficulties in solving the prob- 
lem of expected lifetime maximization are the same as in activity network 
optimization. 



2.3 Queueing Network 

Queueing networks provide a very rich class of stochastic network models. 
These models play the key role in simulation study of computer systems, 
communication networks, production lines, flexible manufacturing systems. 
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Figure 2: A reliability network. 

etc. In this part of the section, a general class of queueing networks is 
considered and several performance measures of the network arc defined. 

The network which we describe consists of L single-server nodes. There 
are a server and a buffer with infinite capacity in each node z, i = l,...,L. 
Once a customer arrives into node i, he occupies the server if it is free. 
The server keeps busy a random amount of time until the service of the 
customer has been completed. Upon the completion of its service at node i , 
the customer goes to node j , chosen according to some routing procedure 
described below. We suppose that the customer arrives immediately into 
node j. 

The customer may find the server of node i being busy. In that case, he 
joins the queue at the node and is placed into the buffer. The discipline in 
which queued customers are called forward for service is first-come, first- 
served. We assume that at the initial time, all servers of the network are free 
and there arc n (0 < n < oo) customers in the buffer at node i, i = 1, . . . , L. 
The customers are presumed to be of a single class. 

For the network, define the set of random variables T = {ry}, where 
Tij{6,uj) is the service time of the customer that is the jth to initiate a 
service at node i. These random variables depend on the set of decision 
parameters 9 G Q and a random vector uj , and they are presumed to be 
given data. 

Now, we discuss a routing mechanism of the network. A general routing 
procedure may be defined by means of the set of random variables, S = 
{cTij} , where CTij^to) represents the next node to be visited by the customer 
who is the j th to depart from node i . Let Sij be a realization of aij (to) for 
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a fixed u, Sij E {1, . . . , L} . The matrix 



S 



I Sii Sl2 ... \ 
S21 S22 ... 



is referred to as a routing tabic of the networlc. In addition, denote the set 
of all possible routing tables of the network by S . 

Firstly, we consider a special case of the network to demonstrate the 
relation between the queucing networks and those we have described above. 
Let us fix a routing table S" € S and consider the network with the deter- 
ministic routing procedure defined by S . One can state a lot of optimization 
problems of the network. The problems may differ in the performance cri- 
teria to be optimized. In order to produce useful representations of sample 
performance functions of the network, we introduce the following notations. 
For every node i , z = 1, . . . , L , let 

aij be the time of the jth arrival into the node, 
j5ij be the time of the jth initiation of a service, 
5ij be the time of the jth departure from the node. 

Obviously, for each node i we may analytically represent the relationship 
between these variables as follows 



It should be noted that in the above identities each a^j coincides with some 
5mk because the transition of customers from one node to another is imme- 
diate. 

One of the performance measures of the network is the expected value 
of the Mth service completion time at node K 



that we wish to minimize with respect to G O for given K and M . Some 
other performance criteria one usually choose to optimize will be defined 
below. 

The following fact is of great importance. For the network with a deter- 
ministic routing procedure, the sample completion time 5ij may be repre- 
sented as a function of r G T using the operations max, min, and -|- for 
every i,j provided that this service occurs. 

To illustrate this, consider an example of network with three nodes and 
the routing table 




6io = 0. 



(3) 



D{e) = E[5KM{e,uj)] 




2 113 
13 11 
2 3 12 
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Assume ni = n2 = = 1 and choose a node of the network, say node 
2. Since there is a customer at node 2 at the initial time, using ([3]) we may 
write 

"21=0, /321 = 0, 521=T21. 

It is easy to see from S that the customer who is the second to arrive 
into node 2 will be one of those first serviced at both nodes 1 and 3. It 
will be just that customer who completes his service earlier. In that case we 
have 

022 = min{rii,r3i}, /322 = max{a22, (52i}, S22 = (^22 + T22- 

Based on S , we also deduce that the other customer from these two will 
be the third to arrive into node 2. Therefore, we may write 

a23 = max{rii, T31}, /323 = max{a23, fe}, = P2?. + T23. 

Finally, for 82^ we get 

(523 = max{max{Tii, rsi}, max{min{rii, rsi}, T21} + r22} + T23. 

As we can see, there is the representation of the sample performance 
function of the network like that we have pointed out in the previous ex- 
amples. Notice that it may be very difficult to obtain such representations 
in practice. However, it is essential that the representation exists. We will 
give reason for this fact in the next section. 

Now we define some traditional performance measures of the queueing 
networks. These definitions are also extended to the general network with 
the stochastic routing procedures. 

Suppose that we observe the network until the Mth service completion 
at node K , for given K,M . As sample performance functions for node K 
in the observation period, we consider the following: 

t{6,uj) = Xljli {^Kj{0,uj) — aKj{0,uj)), the average total time per 
customer, 

w{6,uj) = jjJ2jLi {I^Kj{d,^) — O£Kj{0,i^)), the average waiting time 
per customer, 

u{9,uj) = J2jLi TKj{(^,^)/^KM{(^,^), the average utilization per unit 
time, 

c{6,io) = J2jLi {^Kj{d,^) — OiKj{(),^)) / ^km{(),^)-, the average number 
of customers per unit time, 

q{9,uj) = YljLi{(^Kj{0,u}) - aKj{0,u}))/6KM{d,^^), the average queue 
length per unit time. 

Denote the expected values of these sample functions with respect to w 
by T{6), W{e), U{e), C{e), and Q{e), respectively. They are the perfor- 
mance measures that one usually chooses to optimize the network. 
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Note that the sample performance functions are described in terms of 
elements of the set {a}, {5}, and {r}. It will be shown in the next 

section that each of the random variables a, f3 , and 5 is expressed by a func- 
tion of r G T, using the operations max, min, and +. This circumstance 
is of great importance and it will be necessary to study analytical properties 
of the performance measures and their gradient estimates in Section [5j 

For the general routing procedure defined by the set S , any of the above 
sample performance functions may be represented as 

f{9,u) = J2 l[EM=s]/5(e,^). (4) 
ses 

Here l[s(a;)=5] is the indicator of the event {S(a;) = S} , and fs{9,uj) is 
the sample performance function that coincides with f{9,uj) provided the 
routing procedure is deterministic and defined by the routing table S . 

It should be noted in conclusion, that the sample performance functions 
of the queueing network are more difficult to take their expectation analyt- 
ically than those of the networks we have considered above. 



3 Algebraic Representations for the Networks 

We have seen that the functions of network performance possess some al- 
gebraic properties. The point is that they may be expressed as a function 
of given random variables by means of the operations max, min, and +. 
For the activity networks and reliability, this follows directly from recur- 
sive equations ([TJ and ([2]) and does not require any special proofs. The 
possibility of such representations in describing of the sample performance 
functions of the queueing network is not obvious. In this section, we give 
the theorem that states the existence of the representation in the case of the 
network with a deterministic routing procedure. Two technical lemmae are 
also included in the section. 

In order to simplify further formulae, we introduce the notations V for 
maximum and A for minimum. In addition, we will use the sign \/ (/\) to 
represent an iterated maximum (minimum), i.e., 

n / n \ 

\J Xi = Xi\J . . .\J Xn [ l\xi = Xl^ . . . ^Xn \ ■ 

i=l \i=l J 

Let X be a set supplied with the operations +, V, and A. Without 
loss of generality, we may consider X to be a set of real numbers. It is easy 
to extend the result of this section to various sets of real-valued functions 
and random variables. We assume that the traditional algebraic axioms are 
fulfilled in X. In particular, we will use the following axioms. 
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Axiom 1. Distributivity of maximum over minimum. 

Vx, y, z G X, (x A y) V z = (x V z) A (y V z). 
Axiom 2. Distributivity of minimum over maximum. 

Vx, y, z G X, (x V y) A 2; = (x A z) V (y A z). 

Axiom 3. Distributivity of sum over maximum and minimum. 

Vx, y, zGX, (x Vy) + z = (x + z) V (y + (x Ay) + z = (x + z) A (y + 

The proof of the representation theorem for the queueing networks is 
based on the next result. Let X = {xi,...,x„} be a finite set of real 
numbers. Suppose that we arrange its elements in order of increase, and 
denote the fcth smallest element by x^^) . If there are elements of an equal 
value, we count them repeatedly in an arbitrary order. 

Lemma 1. For each k = 1, . . . ,n , the value of x^^) is given by 

/e3fe iei 

where Sfc is the set of all k subsets of the set N = {1, . . . ,n} . 

Proof. Denote the set of indices of the first k smallest elements by /* . It is 
clear that X(^p.^ = Vie/* Consider an arbitrary subset I G 9^. Obviously, 
if I ^ I* , there is at least one index j € I such that Xj > X(jt) . Therefore, 
we have X(fc-) < Vig/ ■ It remains to take minimum over all / G in the 
last inequality so as to get ([5]). □ 

Theorem 2. Let 5" G S be a fixed routing table. For the network with 
the deterministic routing procedure defined by S , every Oij , /3ij , and 6ij 
(i = 1, . . . , L; j = 1, 2, . . .) is represented as a function of t € T , using the 
operations max, min and +, provided that its associated service occurs. 

Sketch of the proof. Consider a network of L nodes with a routing table S . 
The recursive equations of the network are written as 

^ij = Pij + Tij , 

f3ij = aijV5ij-i, j = l,2, (5io = 0, 

for each node i, i = 1, . . . , L . The main idea of our proof is to reduce these 
equations to the one that expresses the departure time Sij through both 
the service time Tij and some other departure times Sf^^n (^k G {1, . . . 
m G {1, 2, . . .}) only. 

Let us first examine Oij , i.e. the arrival time of the customer who is the 
jth to come into node i. It is plain that this customer is one of those who 
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are to leave some other nodes to go to node i . Therefore, aij coincides with 
one of the departure times 6km such that Skm = i ■ 

Consider the customers who have to go to node i from some other, say 
node k. Clearly, we may restrict ourselves to the first j customers because 
this is enough to provide the j th customer to come into node i . Denote the 
set of the times at which customers depart from node k by Ak{i,j), k ^ i. 

It may happen that Sim = i for some m. This means that a customer 
who is the mth serviced at node i should join the queue of the same node 
again. In this case, we have to take account of such customers being served 
before the jth one only. The corresponding set of the departure times is 
^iihj) = {8i'm\sim = i and m < j} . 

Let A(i,j) = U^^]^Afc(z, j) . Note that if there are > customers 
in the buffer of node i at the initial time, then (y.i\ = • • • = o-in^ = 0. It 
is plain that for j > rij the arrival time aij coincides with the (j — nj)th 
smallest element of A(i,j). It follows from Lemma[T]that a^j is represented 
as a function of elements of the set A{i,j) by using the operations max and 
min. In addition, using ([6]) we may get such a representation for f3ij as a 
function of departure times S £ A(z, j) U {6ij^i}. 

Finally, it follows from ([6]) that there exists a representation of 5ij as 
the function of both Tij and the elements of the set A{i,j) U {6ij-i} which 
is a superposition of the operations max, min, and +. One can resolve this 
equation and get 6ij as a function of the service times r € T only. This 
produces the representation that the theorem requires. □ 

We conclude this section with the following technical lemma that offers 
a general form of the representation. 

Lemma 3. Let f{xi, . . . ,Xp) be a function of the variables xi, . . . ,Xp taking 
their values in (p is defined as a composition of the operations V, A, 
and + . Then if can be represented as 



where I and Ji for all i £ I are finite sets of indices, and all a^- are 
integers. 

Proof. Without loss of generality we suppose that there is no more than 
one entry of each variable xi,. . . ,Xp into the expression. If some variable 
has two or more entries, we introduce additional ones so that the above 
presupposition would be fulfilled. Let us prove the lemma by induction on 
the number of variables. 

For p = 1 , the statement of the lemma is obvious. If p = 2 , there are 
three possibilities 



p 




i&I j&.Ji k=l 



xi V X2, xi A X2 and xi + X2 
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and it is clear that the statement is also true. 

Assume that the statement of the lemma is true up to some value p—1- 
Consider an expression ip oi p variables. Clearly, there is an operation in 
the expression that should be performed after the other ones. Denote this 
operation by the asterisk *. In this case, we have ip = (pi*ip2 , where ipi and 
ip2 are expressions such that each of them cannot include all the variables 
xi, . . . ,Xp. By the assumption, the statement of the lemma holds for both 
ipi and ip2 ■ Now, we have three possibilities for the operation * . 

1. V. This is obvious. 

2. A. It is sufficient to apply Axiom [TJ 

3. +. To obtain the representation in this case, one has to apply succes- 
sively Axioms [H [2] and [3l 

Consequently, the statement of the lemma is true for ip = (pi * ■ D 

4 Estimates of Gradient 

To optimize the network performance measure F{6) = E[f{9,u})] , one often 
needs information about the gradients dF{6) / dO . In the absence of analyti- 
cal formulae for the gradient, Monte Carlo experiments may be performed to 
estimate its values. There are three general methods of estimating dF{0)/d9 
based on data obtained by simulation [H El [7]. In the first two methods, 
the gradient is approximated by the finite differences and then estimated by 
using the Monte Carlo approach. To illustrate these two methods, assume 
^ to be a scalar and consider the following estimates: 
The crude Monte Carlo (CMC) estimate: 

1 ^ 

GcMc = ]^ E (/(^ + '^^) - ^N+i)) ; 

i=l 

the common random number (CRN) estimate: 

1 ^ 

GcRN = j^^Y.^f{e + M,u:.^- f{0, u,)), 

i=l 

where , i = 1, . . . , 2N are independent realizations of the random vector 
Lo . The second estimate differs from the first in one respect: in the CRN 
estimate the random variables cjj are the same for both 9 + A9 and 9, 
whereas in the CMC estimate they are different. Note that each of them 
requires 2xN simulation runs ( at the original value 9 and N at 9 + A9 ) . 
Clearly, in the case of the vector 9 € , one must perform (n -|- 1) x 
simulation experiments to get each estimate. In |lj has shown that the 
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finite difference estimates have tlie mean square error (MSE) wliich is of 
order 0(iV-V3) q^^^^ 0{N-^l^) for Gcrn ■ 

We may somewhat improve the MSE properties of the estimate by us- 
ing more sophisticated finite difference formulae. However, the estimates 
become very expensive in terms of computation time because they require 
a large number of additional simulation experiments. For example, the fol- 
lowing symmetric difference estimate 

1 ^ 

i=l 

requires 2 x nx N simulation runs, when 9 € i?"). 

An estimate of the third method can be written in the form 

1 ^ f) 

i=l 

provided that the gradient of the sample performance function (sample gra- 
dient) exists. It should be noted that although we may obtain values of 
the sample performance function by simulation, it can be rather difficult to 
evaluate its gradient. 

Recently, a new technique called infinitesimal perturbation analysis (IPA) 
has been developed [2] as an efficient method of obtaining gradient infor- 
mation. The IPA method yields the exact values of the sample gradient 
df{9,uj)/d6 by performing one simulation run. The method is based on the 
analysis of the dynamics of the network and closely connected with the sim- 
ulation technique. Therefore, one can easily combine an IPA procedure for 
calculating the sample gradient with a suitable algorithm of network simu- 
lation. Such a procedure provides all the partial derivatives of the sample 
gradient simultaneously during one simulation run. Furthermore, it needs 
an additional computation cost which is usually very small compared with 
that required for the simulation run alone. 

The key question concerning the IPA method is whether it produces 
an unbiased estimate of the performance measure gradient. It can easily 
be shown that if df{9,uj)/d9 is an unbiased estimator of dF(9)/d9 then 
estimate ([7]) has MSE which is of order 0{N~^). In short, in the case of 
unbiasedness, this is a very efficient estimate which provides considerable 
savings in computation. 

In the next section, using the algebraic representation of Section [3l we 
will examine properties of the network performance functions so as to derive 
the conditions for estimate ([7]) to be unbiased. 
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5 A Theoretical Background of Unbiased Estima- 
tion 



It is easy to understand that a sufficient condition for the estimate d?]) of 
the gradient dE[f{6,oj)]/d6 at some 6* S G to be unbiased is 



(8) 



Cao showed in that ([8]) holds in the case of f{9,u)) being uniformly 
differentiable at 6 w.p. 1. Note that such a differentiability property is not 
easy to verify and hard to interpret for practical systems. A useful way to 
prove the interchange in ([8|) is to apply the Lebesgue dominated convergence 
theorem 0. We use this theorem in the following form. 

Theorem 4. Let (Jl, J^, P) be a probability space, Q C i?" and f : QxQ — > 

R be a -measurable function for any E and such that the following 
conditions hold: 

(i) for every 9 £@, there exists df{9,ijj)/d6 at 6 w.p. 1, 

(ii) for all 61,62 G ©, there is a random variable A(a;) defined on the 
same probability space, with E\ < 00 and such that 

\f{6uio)- f{e2,io)\<X{uj)\\ei-92\\ w.p. I. (9) 



Then equation holds on Q . 

As an important consequence, we may state that the function F(6) = 
E[f(6,uj)] is a Lipschitz one with a constant L = EX and continuously 
differentiable on Q, provided / satisfies the theorem conditions. 

Definition 1. A function f(9,u}) defined on the probability space {^},J^,P) 
at every 6 £ Q belongs to the set Vq^q (or simply V) if and only if it 
satisfies the conditions of Theorem^ 

Example 1. Random variables which arise from a simulation study of net- 
works, can be treated as members of a family of random variables [1]. There 
are few families one usually applies, namely the Exponential family, the 
Gaussian family, etc. Various random variables of a family may be obtained 
from the standard variable by using a suitable transformation. An ordinary 
way to transform random variables is based on changing the location and 
scale parameters. 

Let (^(w) be the standard random variable of a family. Define 

f{6,u;) = 6ia^) + 92, 
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where 6 = (6'i,6'2)'^ G 6 C i?^. Let us check whether it holds that f eV. 
Obviously, the partial derivatives of / with respect to 9i and 62 exist for 
almost all to and equal 

A/(0,a;)=eM and A/(^,,^) = l. 

In addition, it is easy to verify that / satisfies the condition (ii) of Theorem^] 
with A = 1^1 + 1. If E\S^ \ < 00, as is usually the case, then the conditions of 
Theorem H] are fulfilled for / and we have / G 2? . 

The next technical lemmae give the sufficient conditions for the arith- 
metic operations and the operation max and min not to break the main 
properties of the functions from T> . 

Lemma 5. Let f,g^T> and let Ai and A2 he the random variables that 
provide the condition (ii) of Theorem^for f and g, respectively. Let /xi,/i2 
and V he positive random variables. Then the following are satisfied 

(i) f + 9^V; 

(ii) If a is a hounded random variable, then af G T> ; 

(iii) // I/I < /ii and \g\ < ^2 hold w.p. 1 for any 6 G @ and E[Xifi2 + 
A2/ii] < 00, then fgGV; 

(iv) // I/I < Hi and \g\ > v hold w.p. 1 for any 9 £ Q and E[hi\2/v'^ + 
Xi/f] < 00, then f/g G T> . 

Proof. Clearly, f + g, af , fg and f/g are measurable functions of uj and 
differentiable ones on Q w.p. 1. Since for all of these functions the proofs 
of inequality ([9]) are quite similar, we verify it for only one of them. For 
instance, we examine h = fg. 
For all ^1 , 6*2 G we have 

\h{9i,uj)-h{e2,uj)\ 

= \f{9i,uj)g{9i,oj)-f{92,Lo)g{92,uj)\ 

= \f{9i,u)g{9i,co) - f{92,u;)g{9i,Lo) + f{92,u)g{9i,u) - f{92,u;)g{92,Lo) 

< \g{9i,u)\\f{9i,Lo) - f{92,u)\ + \f{e2,u;)\\g{9i,u)-g{e2,u;)\ 

< (Ai(w)^2(w) + A2(a;);Ui(tj))||6'i - 6*211 w.p. 1. 

In short, 

\h{9i,uj) - h{92,co)\ < X{uj)\\9i - 92\\ w.p. 1, 

where 

A = Ai/i2 + A2^i, EX = E[Xifi2 + A2^i] < 00. 
By Theorem U we conclude fgGV. □ 



15 



Notice, from Lemma [5] (i) and (ii) it follows that being closed for the 
operations of addition and multiplication by bounded random variables, T> 
is a linear space of functions with these two operations. 

Lemma 6. Let f,g ED. Suppose that for any 6q^6 , there exists a neigh- 
bourhood U^(0o) of 6q w.p. 1 such that one and only one of the following 
conditions 

(i) f{e,u:)=g{e,Lo), 

(ii) f{e,u)<g{e,u:), 

(iii) f{e,u)> g{e,u:) 

is satisfied for all 6 G Ua;(6'o) . 

Then f V g e V and f A g € V . 

Proof. Consider h{9,u;) = f{9,u}) V g{9,uj) . It is clear that h is measurable 
with respect to a; . In order to prove differentiability of h w.p. 1 on G , we 
examine an arbitrary 6 £ Q . There are only two possibility for h not to be 
differentiable. Firstly, it is possible that the derivative of h at 6 does not 
exist if at least one of the derivatives 



dfie,Lo) 



de 



and ^"C-"' 



de 



does not. In addition, h may not be differentiable at 6 if the maximum 
of the functions / and g changes over from / to at this point or vice 
versa. The last case is equivalent to that there exists u £ such that all 
the neighborhoods Uaj(^o) C @ contain both points at which 

f{9,u:)=g{0,u) and f{e,u)^g{e,Lo). 

By the assumption of the lemma, both of these cases may occur only with 
zero probability. Therefore, there exists dh{6,uj)/d6\e=9f) at all € 
w.p. 1. 

For the function h , the proof will be completed if we show that h satisfies 
condition (ii) of Theorem SI Since f,g£T>, there are random variables Ai 
and A2 with E\i < 00 and EX2 < 00 such that the inequalities 

\f{6i,uj)-f{e2,uj)\ < Xi{io)\\ei - 6211 w.p. 1 

\g{ei,uj) - g{e2,uj)\ < A2(t^) 116*1 - 6*211 w.p. 1 

hold for all 61,62 € &■ Let lo be an arbitrary element of at which both 
these inequalities hold. Divide into two subsets: 

= {9 ee\f{e,uj)>g{e,uj)}, 
Yo, = {ee@\f{e,Lo)<g{e,uj)}. 
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Obviously, it holds 

\h{ei,u)-h{e2^uj)\<\i{u)\\ei-e2\\ 

for all 6*1 , ^2 £ and 

\h{ei,uj)-h{e2,uj)\<\2H\\ei-e2\\ 

for all 6*1, 6*2 G Y^. Assume Oi G X^,6'2 G Y^. If h{9i,u}) > h{92,uj), we 
deduce 

|/i(0i,w)-/i(e2,u;)| 

= 1/(^1,^) -5(^2,^)1 

< \f{9uu^)-f{e2,io)\<\i{u;)\\9i-92\\. 
Similarly, if h{9i,u) < h{92,u}), we have 

\h{9i,io) - h{92,io)\ < X2{co)\\9i - 92\\. 

It follows that 

\h{9i,uj) - h{92,oj)\ < Xiu;)\\9i - 92\\, AH = Ai(6^) V A2H, 

for all 9i,92 G . Since this inequality holds for almost all w G , we 
conclude that 

\h{9i,u;) - h{e2,oj)\ < X{oj)\\9i - 92\\ w.p. 1, 

and EX = E[Xi V A2] < EXi + -E;A2 < 00. 

In other words, h satisfies the conditions of Theorem HI Consequently, 
f y g &V . The proof of the statement / A 5 G P, is analogous. □ 

It should be noted that the condition of Lemma [6] is not necessary, as 
the next example shows. 

Example 2. Let Q = [—1,1], {0,,J^,P) be a probability space, where 
= [0,1], T is the cr -field of Borel sets of and P is the Lebesgue 
measure on 17. Consider the following functions: 

f{9,io) = -9^ + UJ, g{9,Lo) = 9^ + UJ 

and 

h(o \ f(a (a \ / -^^ + ^' if-l<0<O, 

One can easily verify that for any neighbourhood of ^ = 0, there exist 
both points with f > g and f < g w.p. 1. The conditions of Lemma [6] 
are therefore violated. Nevertheless, h is differentiable at for all a; G fi. 
Moreover, it holds that /i G P. 
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Corollary 7. Let f,g € V . If for every 9 ^ Q it holds that f ^ g w.p. 1, 
then fygeV and f Ag eV. 

Proof. Clearly, the condition of the corollary implies that either f — g > or 
f — g < holds at every 9 G Q w.p. 1. Since f,gGV, these two functions 
are continuous functions of 9 w.p. 1 as well as f — g- Because of continuity, 
f — g>0{f — g<0) holds w.p. 1 not only at 9 , but also at every points 
of a neighbourhood of 9 . It remains to apply Lemma [6l □ 

Using Corollary [71 we give the following general conditions for T> to 
provide closeness with respect to the operations V and A . 

Lemma 8. Let f,g & V . If for any 9 G Q it holds that the random variables 
f{9,uj) and g{9,uj) 

(i) are independent, 

(ii) at least one of them is continuous, 

then fygeV and f Ag eV. 

To prove the lemma it is sufficient to see that its conditions lead to that 
of Corollary [71 

The next two examples show that both conditions of Lemma [HI are es- 
sential. 

Example 3. Let {^},T,P) and B be defined as in Example [51 Also define 

f{9,u}) = -9 + uj and g{9,uj) = 9 + oj. 
Let us consider the function 



It is clear that f,g^T> and for every ^ € 0, the random variables f{9,u)) 
and g{9,uj) are continuous. Although inequality ([H) holds with A = 1 for h, 
this function is not differentiable at = for all a; € ^2. Therefore, h ^T>. 

Example 4. Let 9 = [0, 1] , $7i = $^2 = [0, 1] and P be the Lebesgue 
measure on = J^i x • Denote uj = {uji,uj2)^ and consider the following 
functions: 



h{9,Lo) = f{9,u)Vg{9,uj) 



-9 + uj, if-l<6'<0 
9 + uj, ifO<6'<l. 




and 



hi9,u;) = f{9,uj)\/ g{9,uj) 



{ 



max{ 2 
1, 



^9, 6'^}, if < I and uj2 < 



1 

2' 



otherwise. 
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One can see that f,g&T> and for every G , the random variables f{6,u) 
and g{6,u}) are independent. In addition, condition (ii) of Theorem [5] holds 
for h with A = 2. Nevertheless, h = msx{^6,6^} with probability |, that 
is not a differentiable function at 9 = ^. In that case, h ^ V. 

Lemma 9. Let Ai be a set of functions from T> such that for any f,g(zM, 
the conditions of Lemma\^ are fulfilled. Then M. is closed for the operations 
max and min. 

Proof. Let f,g^A4 and let us define h = f V g. Note that /i € D by 
Lemma [6l We have to prove the conditions of Lemma [6] are satisfied for h 
and any u G ^A. 

If u is either f or g , say u = f , we may write 



h-u=fyg-f 



9- f, f < 9, 
0, if / > g. 



Since f,g(zA4, for any point of , there is a neighbourhood on which only 
one of the conditions f — g < 0, f — g = 0, or f — g > holds w.p. 1. 
From the above identity this also holds for h — u on the neighbourhood. 
Consequently, in this case the conditions of Lemma O are fulfilled. 
Now we assume u G A4\ {/, g} . We have 



h — u = f\/g — u 



g- f, f < g, 
f -u, ii f >g. 



Let us examine any 6* G ©. Suppose that f < g w.p. 1 at ^. Since f,g, 
and u belong to A4, there are neighborhoods \Ji^{9) and V(^(0) where the 
conditions of Lemma [6] are fulfilled for each pairs of functions (/, g) and 
{g, u) , respectively. It follows from the above expression that the neighbor- 
hood U^jPI V^(6') is that Lemma [6] requires for h and u. If it holds that 
f > g or f = g at 9 , the reasoning is the same. 

In short, we have shown that the conditions of Lemma [6] are satisfied for 
h and any u & A4 and, therefore, h = f M g ^ Ai . In the case of minimum, 
the proof is analogous. □ 

Corollary 10. If fj & A4 for every j = 1, . . . ,N , then it holds 

V A e M, 

i€i jeJi 

where I is a finite set of indices and Ji C {1, . . . , N} for every i & I . 

This is an immediate consequence of the previous lemma. 

The next example is of importance to the main result of the section. 
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Example 5. Let fj G D for all j = 1, . . . , N . Suppose that at every € 0, 
all the random variables fj{9,uj) are continuous and independent. Define 
£ to be a set of linear combinations Yliei^ifi with integer coefficients Oj, 
i € / C {1, . . . ,N}. Obviously, £ is stable for addition. For all functions 

u = Yjiei "-ifi ^"^^ ^ = ^jeJ ^jfj ' ^^'^^ u-v = Ylk€K (^kfk ■ It is clear 
that for every 6 £ Q, u — v is a. continuous random variable because of 
the properties of / (except for the case of all = which is obvious). 
Therefore, it holds that u — v ^ w.p. 1 at every G . Similarly as in 
Corollary [3 one can deduce that u and v satisfy the conditions of Lemma O 
From this we conclude that £ may be treated as an example of M . 

One can easily see that the condition of continuity is essential to this 
reasoning. To illustrate the important role of independence, consider the 
following functions 

f{d,Lo) = -20 + 2uj, g{d,u;) = 9 -co, and u{e,uj) = 9 + uj, 

under the same assumption as in Example [3l It is easy to verify that the 
conditions of Lemma [6] are fulfilled for any two functions of them. Never- 
theless, the functions u and v = f + g do not satisfy the conditions, as 
Example [3] has shown. 

Now, we may formulate the main result of the section. We first introduce 
some definitions. Let A be the algebra of all functions f : Q x ^} — > R 
being defined on the probability space (fi, J^, P) at every 9 £ Q with the 
operations V, A, and +. In other words, this is a closed system of the 
functions for these operations. 

Definition 2. Let T he a finite subset of functions of A. We define [T]_4 
to he the set generated by T in A, that is the set of all functions being 
obtained from ones of T by means of the operations V , A , and + . 

Theorem 11. Let T € D. Suppose that for all r € T, t{9,uj) are contin- 
uous and independent random variables at any 9 £ Q . 
Then it holds [T]^ C P. 

Proof. It results from Lemma [3] that every / E [T]^ can be represented as 

iei jeJi reT 

where all ajj are integers. It has been shown in Example [5] that the func- 
tions of the family {J2TeT '^k'^}k=i,2,... satisfy the conditions of Lemma El 
Applying Corollary [TOl we conclude that the statement of the theorem is 
true. □ 

It is important to note that the conditions of Theorem [11] are rather 
general and usually fulfilled in the network simulation. In particular, in 
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contrast with the traditional approaches (cf., for example, existing results 
on the unbiasedness of IPA estimates in [2^, 'S] ) , we may not restrict ourselves 
to the exponential distribution. 

In short, to satisfy the theorem only the following are required for the 
functions of the set T : 

(i) for any € 0, all r € T are continuous and independent random 
variables; 

(ii) each r € T as a function of 9 is differentiable w.p. 1 and Lipschitz 
one with an integrable random variable as a Lipschitz constant. 

In the next section we will show how these results can be applied to some 
problems to verify the unbiasedness of gradient estimates. 

6 Applications 

Now we discuss the applications of the previous results to optimizing the 
networks. In particular, we describe algorithms of obtaining sample gradi- 
ents, based on the algebraic representation of the networks. In this section 
we keep using the notations {i},J-,P) and for the underlying probability 
space and the parameter space, respectively. 

We begin with the stochastic activity network. Let the duration of the 
jth activity be represented by the function Tj{6,u}). Denote the set of all 
such functions of the network by T . As we have seen, a sample completion 
time of the network t{6,uj) may be expressed by functions of T by using 
only the operations max and +. This implies t € [T]_4. 

Suppose that T G D, and all r G T are continuous and independent 
random variables at every 9 (z Q . For the mean completion time T{9) = 
E[t{9,uj)], it follows from Theorem [n that (1/A^) Y.'^=idt{^^^i)/d9 , where 
cjj G 0, is an unbiased estimate of the gradient dT{9)/d9 . 

As an example, suppose t{9,uj) = — 01n(l — w), where 9 £ R and the 
random variable uj is uniformly distributed on [0,1]. It is well known [1] 
that — ln(l — uj) has an exponential distribution with mean 1. Similarly as 
in Example [H we have t €z V . In addition, durations of the activities are 
normally considered as independent in the probabilistic sense. Our results 
are therefore applicable in this case. 

Now suppose that there is a simulation procedure for the activity network 
with L nodes to provide a simulation experiment for any fixed 9 £ Q and 
a realization of u . One can easily combine it with the following algorithm. 

Algorithm 1 

Step (i). At the initial time, fix values of 9 and uj; set gj = for j = 
1, . . . , L, and set c = 0. 
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Step (ii) . Upon the completion of any activity i , add the value of dri {O^u)/ 86 
to Qi and add 1 to c; if c = L, then save gi as the value of dt{6 ,oj) / dO 
and stop; otherwise go to Step (iii). 

Step (iii). Determine the set N£)(i). For every j € Ni5(i), if all activities 
of the set Nir(j) have been completed, then set gj = gi. 

To verify the correctness of Algorithm 1, it suffices to see that it is simply 
based on recursive equation ([1]). 

For a reliability network, one can apply Theorem [11] in a similar way. 
As in Section [3l denote the sample lifetime of a system by t{6,u}). It is not 
difficult to construct the next algorithm that calculates the sample gradient 
dt{9,uj)/de. 

Algorithm 2 

Step (i). At the initial time, fix values of 9 and oj . 

Step (ii). Upon the failure of element i, exclude all nodes representing the 
elements that are now not able to keep working from the set N as well 
as the corresponding arcs from the set A. 

Step (iii). If for the reduced set N it holds NnN^; = 0, then save dTi{e,uj)/d9 
as the value of dt{9,uj)/d9 and stop; otherwise go to Step (ii). 

Finally, we consider the queueing network which is a rather complicated 
model. Applying Theorem [11] to a network with a deterministic routing 
mechanism, we may conclude that 

(i) if T e P, and for all r € T, t{9,uj) are continuous and independent 
random variables for every G 0, then the estimate ([7|) is unbiased 
for both the expected average total time T and the expected average 
waiting time W; 

(ii) if in addition to previous assumptions, for all ti,T2 € T, condition 
(iv) of Lemma [5] is fulfilled, then the estimate ([7]) is unbiased for the 
expected average utilization U , the expected average number of cus- 
tomers C and the expected average queue length Q . 

In the case of the stochastic routing mechanism with a random routing 
table, the above conclusions still hold true. This follows from representation 
Q and Lemma [5] (ii) because of the boundedness of the indicator random 
variable. 

In order to construct a useful algorithm of calculating a sample perfor- 
mance function gradient, we first consider the identities at ([3]) in the form 

5ij{9,uj) = niax{aij{9,u;),5ij-i{9,u;)} + Tij{9,uj). 
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Differentiating the function 6ij{6,u}) for a fixed u; G at 6 , we have 

(fj \^ / + Oiij{9,uj) < 6ij-i{e,uj), 

The inequahties in the conditions of the right-hand side mean the following. 
The inequality Oij > (Jij-i implies that at the arrival of the jth customer 
into node i, the service of the previous one, the (j — l)th, has been com- 
pleted. Therefore, at that time the server is free. The meaning of the 
contrary inequality is that the server is busy at the arrival of the jth cus- 
tomer. 

In the first case, the value of d5ij{9,uj)/d6 is defined by daij {6 , u) / dO . 
Note that the function aij{9,uj) = dkm{6,^), where 5km{G-,^) represents 
some mth completion time at a node k. It is the service completion of 
the customer that is the jth to arrive into node i. In other words, in that 
case one have to add the value of d5km{9,uj)/d9 to drij {9 , co) / 89 to obtain 
d6ij{9,uj)/d9. 

If it holds Oij < 6ij-i, then the value of d5ij {9 , oj) / 89 is calculated by 
addition d6ij-i{9,uj)/d9 and dTij{9,uj)/d9 . 

It results from Section [5] that if for any ^ S , all r € T are continu- 
ous and independent random variables, then aij{9,uj) ^ 6ij-i{9,uj) w.p. 1. 
Therefore, the above expression defines the sample gradient w.p. 1. 

Now, we consider an algorithm that provide the value of d8KM{9-,^)/d9 
for fixed 9eQcR, uj^^.Ke {1,...,L}, and M G {1,2,...}. We 
suppose that there is a simulation procedure into which the algorithm may 
be incorporated. 

Algorithm 3 

Step (i). At the initial time, fix values of 9 and uj] set gj = for j = 
1,...,L. 

Step (ii). Upon the jth completion at node i , add the value of dTij {9, uj)/d9 
to Qi] if both i = K, and j = M, then save gx as the value of 
dSKM{9,uj)/d9 and stop; otherwise go to Step (iii). 

Step (iii). Determine the next node r = aij{uj) to be visited by the cus- 
tomer; if the server of node r is free, then set gr = gi- 

Note that in the case of a vector of parameters, C i?" , the algorithm 
is analogous. It only needs to change gi for the vector = {gn, . . . ,gin)~^ 
and to treat the arithmetic operations as the vector ones. 

It is easy to see that the algorithm of evaluating 6ij{9,uj)/d9 plays 
the key role in calculating gradients of sample performance functions of 
the network. It is included as the main part in other algorithms. To il- 
lustrate this, we consider an algorithm for the sample function u{9,u;) = 
YljLi'''Kj{9,uj)/6KM{9,uj) , the average utilization per unit time. 
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Algorithm 4 

Step (i). At the initial time, fix values of 6 and uj; set t,d = 0, and gj = 
for j = 1,... ,L. 

Step (ii). Upon the jth. completion at node i, add the value of drij {6 , uj) / 89 
to Qi] ii i = K , then add the value of drij {9 , uj) / 86 to d, and add 
the value of TKji9,uj) to t; if both i = K , and j = M , then set 
h = 6km{9,uj) and stop; otherwise go to Step (iii). 

Step (iii). Determine the next node r = aij{uj) to be visited by the cus- 
tomer; if the server at node r is free, then set Qr = Qi- 

Upon the completion of the algorithm we get {dh — tgK)/h'^ as the value 
of 8u{9,uj)/89. 

It is easy to see that Algorithms 3 and 4 are quite similar to those of 
IPA method in [3]. 

In conclusion, note that the algorithms are rather simple. In fact, they 
only require calculating gradients of given functions and performing some 
trivial operations to produce values of the sample gradients. Using these val- 
ues, one can easily estimate the gradients of network performance measures 
so as to apply efficient optimization procedures. 
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